Data Classification Using Genetic Parallel Programming

Authors

  • Sin Man Cheang
  • Kin-Hong Lee
  • Kwong-Sak Leung

Abstract

A novel Linear Genetic Programming (LGP) paradigm called Genetic Parallel Programming (GPP) has been proposed to evolve parallel programs for a Multi-ALU Processor. GPP has been found to be able to evolve parallel programs for data classification problems. In this paper, five binary-class databases from the UCI Machine Learning Repository are used to test the effectiveness of the proposed GPP-classifier. The main advantages of employing GPP for data classification are: 1) the evolutionary process is sped up by evaluating fitness on parallel hardware; and 2) parallel algorithms are discovered automatically. Experimental results show that the GPP-classifier evolves simple classification programs with good generalization performance, and that the accuracies of these evolved classifiers are comparable to those of existing classification algorithms.

Data classification is a supervised learning process that learns a classifier from a training set; the learned classifier can then be used to classify unseen data records. Lim et al. [1] carried out an extensive study of 33 data classification algorithms on 16 UCI Machine Learning Repository databases, and their experimental results are used here for comparison with the proposed GPP-classifier. The GPP paradigm [2,3] is employed to learn the data classifiers. In GPP, an individual program is represented as a sequence of parallel instructions. Each parallel instruction consists of multiple sub-instructions, so that multiple operations are performed simultaneously in each processor clock cycle. A parallel program is executed on a specially designed Multi-ALU Processor (MAP). The main purpose of this paper is to demonstrate that GPP can evolve classifiers for real-world data classification problems. Experimental results show that GPP evolves binary-class data classifiers whose generalization accuracy is comparable to that of the 33 existing data classification methods presented in [1].
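
The parallel-instruction model described above can be illustrated with a small simulator. This is a minimal sketch, not the paper's MAP: the register count (`n_regs`), the four-operation set with protected division, the `(opcode, dst, src_a, src_b)` sub-instruction layout, and the convention that a positive r0 means class 1 are all assumptions made for illustration. The one feature it is meant to show is that every sub-instruction in a parallel instruction reads the register file as it stood at the start of the clock cycle, so the sub-instructions behave as if executed simultaneously.

```python
# Hypothetical Multi-ALU Processor (MAP) simulator, for illustration only.
# Register count, opcode set, sub-instruction layout and the r0 > 0 output
# convention are assumptions, not details taken from the GPP paper.
from typing import Callable, Dict, List, Sequence, Tuple

# One sub-instruction: (opcode, destination register, source register a, source register b)
SubInstr = Tuple[str, int, int, int]
# One parallel instruction: the sub-instructions issued in the same clock cycle
ParallelInstr = List[SubInstr]

OPS: Dict[str, Callable[[float, float], float]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b if b != 0 else 1.0,  # protected division
}


def run_map(program: Sequence[ParallelInstr], record: Sequence[float],
            n_regs: int = 8) -> List[float]:
    """Execute a parallel program on one data record; return the final registers."""
    if len(record) > n_regs:
        raise ValueError("record has more attributes than registers")
    regs = [0.0] * n_regs
    regs[:len(record)] = [float(x) for x in record]  # attributes go into r0, r1, ...
    for instr in program:
        before = list(regs)  # all ALUs read the registers as they were at cycle start
        for op, dst, a, b in instr:
            # if two ALUs write the same register, the later one wins in this sketch
            regs[dst] = OPS[op](before[a], before[b])
    return regs


def classify(program: Sequence[ParallelInstr], record: Sequence[float]) -> int:
    """Binary decision: class 1 if r0 ends up positive, else class 0 (assumed)."""
    return 1 if run_map(program, record)[0] > 0 else 0


# Two clock cycles: cycle 1 computes r2 = r0 - r1 and r3 = r0 * r0 in parallel,
# cycle 2 computes r0 = r2 + r3.
example_program: List[ParallelInstr] = [
    [("sub", 2, 0, 1), ("mul", 3, 0, 0)],
    [("add", 0, 2, 3)],
]
```

For example, `classify(example_program, [3.0, 1.0])` computes r0 = (3 - 1) + 3 * 3 = 11 and returns 1, while the record `[1.0, 3.0]` gives r0 = -2 + 1 = -1 and returns 0. Reading from a per-cycle snapshot is also what lets the single parallel instruction `[("add", 0, 1, 7), ("add", 1, 0, 7)]` swap r0 and r1 (with r7 still 0) in one cycle, which the same two operations executed one after another would not do.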

We adopt the 10-fold cross-validation method to estimate the classification error rate (CE) of the GPP-classifier. Ten training sets are used to learn ten classifiers, each of which is tested on its corresponding test set, giving ten test-set CEs. These ten test-set CEs are averaged to estimate the generalization CE. We measure both the classification accuracy and the generalization performance; a classifier that generalizes well performs at similar levels on the training and test sets. Furthermore, the GPP-classifier adopts three techniques to avoid overtraining: 1) limiting the size of genetic programs; 2) penalizing over-trained individual programs; and 3) monitoring generalization performance during evolution.

All experiments were run on a software implementation of the GPP-classifier system, which produces a parallel assembly program together with a corresponding serialized C code segment. Table 1 shows the best, average, and standard deviation (stddev) of the training-set CE and test-set CE over 10 independent runs (10-fold cross-validation in each run).

Table 1. Training set CE and test set CE of the GPP-classifier

          training set CE (%)       test set CE (%)
Database  best  average  stddev    best  average  stddev    %∆CE
bcw        2.7      2.9    0.09     3.5      3.9    0.29   25.6%
bld       27.3     28.0    0.69    29.3     31.7    1.74   11.7%
pid       22.5     22.7    0.11    23.7     24.5    0.42    7.3%
hea       14.4     14.8    0.24    16.0     18.9    1.78   21.7%
vot        3.9      4.1    0.10     4.1      4.6    0.23   10.8%
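
The evaluation protocol and the %∆CE column of Table 1 can be made concrete with a short sketch. Everything here is an assumption for illustration: the fold assignment (a seeded shuffle with round-robin splitting) is not specified in the text, `train` is a placeholder for any learner (the GPP evolution itself is not reproduced), and the `delta_ce` formula, %∆CE = (average test CE - average training CE) / average test CE x 100, is inferred from the table rather than stated in it. With the tabulated averages it reproduces the bcw, bld, pid and hea values to one decimal place; for vot it gives 10.9% against the reported 10.8%, presumably because the published averages are rounded.

```python
# Sketch of the evaluation described above. Assumptions (not from the paper):
# folds come from a seeded shuffle with round-robin assignment, and
# %∆CE = (test CE - training CE) / test CE * 100, a formula inferred from Table 1.
import random
from statistics import mean
from typing import Callable, List, Sequence

Classifier = Callable[[Sequence[float]], int]
Trainer = Callable[[List[Sequence[float]], List[int]], Classifier]


def kfold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Split record indices 0..n-1 into k disjoint, near-equal test folds."""
    if n < k:
        raise ValueError("need at least k records")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_error(data: List[Sequence[float]], labels: List[int], train: Trainer,
             k: int = 10, seed: int = 0) -> float:
    """Average test-set CE (%) over k folds; `train` stands in for any learner."""
    fold_errors = []
    for test_idx in kfold_indices(len(data), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(data)) if i not in held_out]
        clf = train([data[i] for i in train_idx], [labels[i] for i in train_idx])
        wrong = sum(clf(data[i]) != labels[i] for i in test_idx)
        fold_errors.append(100.0 * wrong / len(test_idx))
    return mean(fold_errors)  # the k test-set CEs are averaged


def delta_ce(train_ce: float, test_ce: float) -> float:
    """Relative train/test gap as a percentage of the test CE (inferred formula)."""
    return 100.0 * (test_ce - train_ce) / test_ce


# Average training-set and test-set CE (%) copied from Table 1.
TABLE1_AVERAGES = {
    "bcw": (2.9, 3.9),
    "bld": (28.0, 31.7),
    "pid": (22.7, 24.5),
    "hea": (14.8, 18.9),
    "vot": (4.1, 4.6),
}

if __name__ == "__main__":
    for name, (train_ce, test_ce) in TABLE1_AVERAGES.items():
        print(f"{name}: {delta_ce(train_ce, test_ce):.1f}%")
```

Passing `delta_ce` the averages rather than the best values is a deliberate choice: only the averages reproduce the published column.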

Similar sources

Dimensionality Reduction and Improving the Performance of Automatic Modulation Classification using Genetic Programming (RESEARCH NOTE)

This paper shows how genetic programming can be used to select suitable features for automatic modulation recognition. Automatic modulation recognition is one of the essential components of modern receivers, and the choice of features may significantly affect its performance. Simulations were conducted at SNRs of 5 dB and 10 dB. Test and ...

Parallel Genetic Algorithm Using Algorithmic Skeleton

Algorithmic skeletons have received attention in recent years as an efficient method of parallel programming. Using this method, the programmer can easily implement parallel programs. In this study, a set of efficient algorithmic skeletons is introduced for implementing parallel genetic algorithms (PGA). A performance model is derived for each skeleton that makes the comparison of skeletons po...

Bankruptcy Prediction: Dynamic Geometric Genetic Programming (DGGP) Approach

In this paper, a new Dynamic Geometric Genetic Programming (DGGP) technique is applied to the empirical analysis of financial ratios and to bankruptcy prediction. Financial ratios are valuable to investors, creditors, borrowing firms, and governments for predicting corporate bankruptcy and identifying firms’ impending failure. Over time, several methods have been attempted in...

Fuzzy Programming for Parallel Machines Scheduling: Minimizing Weighted Tardiness/Earliness and Flow Time through Genetic Algorithm

Appropriate scheduling and sequencing of tasks on machines is one of the basic and significant problems that a shop or a factory manager encounters; this is why in recent decades extensive studies have been done on scheduling issues. One type of scheduling problems is just-in-time (JIT) scheduling and in this area, motivated by JIT manufacturing, this study investigates a mathematical model for...

Fuzzy Programming for Parallel Machines Scheduling: Minimizing Weighted Tardiness/Earliness and Flowtime through Genetic Algorithm

Appropriate scheduling and sequencing of tasks on machines is one of the basic and significant problems that a shop or factory manager encounters; this is why extensive research has been done on scheduling issues in recent decades. One type of scheduling problem is just-in-time (JIT) scheduling, and in this area, motivated by JIT manufacturing, this study investigates a mathematical ...

Publication date: 2003